ABSTRACT


An  in-depth  study  of  29  time-series  current-meter  records 
shows  that  the  logarithmic-speed  distributions  as  a  group  can 
be  considered  to  plot  symmetrically  about  their  mean,  and 
that  the  distribution  of  the  mean  appears  log-normal.   The 
mean  distribution  does,  however,  exhibit  a  slight  systematic 
deviation  possibly  due  to  transient  phenomena  in  the  data. 
Fifty  drift-of-ship  records  from  the  National  Oceanographic 
Data  Center  were  examined  and  found  (after  a  necessary  data 
alteration)  to  show  the  same  distributional  characteristics 
as  the  current-meter  data.   Indications  from  drift-of-ship 
data  were  that  area  and  seasonal  influences  affect  the  speed 
variability  but  not  the  distributional  characteristics  of 
the logarithmic-speed transformation. The log-normal distribution
for both current-meter and corrected drift-of-ship
data appears to be useful out to a deviation from the mean
of between two and three sigma units.

I.  INTRODUCTION

A.  GENERAL 

Ocean currents have received increasing investigative effort
and monetary expenditure. Research has encompassed the
spectrum of ocean-current related subjects, from the enormous
task of determining the effects should a whole current system
(the Gulf Stream, for instance) be diverted, to studying the
effects currents have on the growth and decay of
density/salinity microstructure in specific localities. A
sound understanding of all facets of ocean currents will no
doubt prove to be a large step forward in completing man's
knowledge of the oceans. One facet of ocean currents which
has received limited attention in research studies is the
statistical properties of measured ocean-current speeds.
This is the specific subject area covered in this report.

B.  MEASUREMENT OF CURRENT VELOCITIES AND SUBSEQUENT STATISTICAL
STUDIES

1.  Ways of Measuring Ocean-Current Velocities

Two means have existed for directly determining current
velocities. One has been to place a stationary or semi-stationary
device in the water which recorded the flow speed
of water around the device. The second has been to record
the set and drift of an object placed in the current.
Nearly all reported investigations in which statistical
procedures were used appear to have based their analyses on
time-series current-meter data.

2.  Categories of Statistical Studies

A statistical approach to measured current velocities
has been the subject or subsection of reported investigational
efforts in Russia, Canada, France, Norway and the United
States. It appears that statistical procedures, as applied
to ocean currents, can be divided into two categories: those
dealing with the spectra of ocean currents, and
those dealing with the distributional aspect of current
velocities. Many of the studies have been concerned with
relating spectral properties of ocean currents to internal
waves, planetary waves, and theories of turbulence. Others,
to a great extent, ignore the spectral properties of a set
of data and are concerned with the distributional properties
of velocity components and speed values.

3.  Current-Speed  Statistical  Studies 

Investigators in Russia and the United States have published the
majority of the reports concerned with the statistical distribution
and analysis of ocean-current velocities. Webster
[Ref. 1] described and discussed some elementary operations
and data presentation techniques for the analysis of a long
time series of current-meter observations. Belyayev and
Ozmidov [Ref. 2], using data measured at a semipermanent
buoy station, derived empirical distributions of the current-velocity
components at ten depths, from 25 to 1200 m. It was
shown that these distributions differed substantially from
normal below the pycnocline and that the third and fourth
moments of the distributions changed abruptly in the
pycnocline.

Paquette [Ref. 3] concentrated his efforts on the
speeds of current-meter records. He showed that in nearly
80% of the time-series current-meter records checked, when
the number of occurrences is plotted against the logarithm
of the speed to base ten, the typically skewed distribution
of speed becomes Gaussian at the 0.05 level of significance
or greater. On long time-series records, the logarithmic
standard deviation appeared to range between 0.15 and 0.32.
He also concluded that part of the distortion often observed
in the tails of the probability distribution of the data
was presumably due to inherent current-meter errors.
Paquette's results concerning the distribution of the data
were presented on cumulative probability plots on which the
empirical distribution of the data was plotted along with a
normal distribution. In general the appearance of the empirical
distribution was quite close to normal, and a
Kolmogorov-Smirnov (K-S) test for normality
suggested this to be true. However, Paquette
did not analyze the results further to show whether, on the
average, the logarithmic-speed distribution produced a nearly
normal curve or some distribution close to normal but with
systematic deviations from normality.

Paquette briefly introduced and analyzed a limited
amount of drift-of-ship (DOS) data. The results indicated
that this type of data compared favorably with the majority of
the current-meter data. However, since the DOS data were not
extensively analyzed, the results were not firm and no comparison
was made between moored current-meter data and DOS
data.

C.   PURPOSE 

The  purpose  of  this  paper  is  to  investigate  more  closely 
the  normality  of  the  logarithmic-speed  distribution  as  it  is 
applied to ocean-current records and to analyze more extensively
DOS data. It will be shown that DOS data (after a
necessary  alteration)  and  current-meter  data  compare  quite 
favorably  and  that  the  logarithmic-speed  distributions  of 
both  types  of  data  can  be  considered  to  be  symmetrically 
dispersed  about  their  means  with  a  high  level  of  confidence. 
The  mean  value  of  a  group  of  logarithmic-speed  distributions 
is  shown  to  have  a  slight  systematic  deviation  probably  due 
to  transient  phenomena. 

In  order  to  extend  the  studies  of  current-meter  time- 
series data to a new area of the ocean and to add more samples
to the data base, eleven current-meter records from the
Coastal  Upwelling  Experiment  (CUE)  off  the  coast  of  Oregon 
were  also  analyzed. 

The basic approach in this presentation and analysis of
results has been two-fold. The first procedure was to record
and plot, at normalized deviations from the mean (NDM) of each
cumulative probability distribution, the difference between the
empirical logarithmic-speed distribution and a log-normal
distribution. If an empirical logarithmic distribution were
truly normal, the differences would be zero and the plot would
be a straight line through the zero values. The second procedure
was to apply known statistical measures and interrelationships
to parameters derived from the first four moments of
a distribution. Specifically, these parameters are the mean,
standard deviation, coefficient of skewness, and coefficient
of kurtosis.

II.  THE DATA

Data  used  in  the  statistical  analysis  and  relationship 
investigations  reported  in  this  paper  came  from  four  sources. 

A.   TIME  SERIES  DATA 

1.  Sources of the Data

One source was moored current-meter data recorded by
Woods Hole Oceanographic Institution [Refs. 4, 5, 6]. The
second source was moored current-meter data recorded by
Paquette and designated SCARF 1 through SCARF 7. These first
two sources include 29 of the 43 time-series records used by
Paquette [Ref. 3].

A third source was eleven sets of moored current-meter
records furnished by Donald Bishop in the office of
the Coastal Upwelling Experiment (CUE) at the University of
Washington. These data were recorded by Oregon State University
at a rate of one speed record every 5 or 10 minutes.
TABLE I gives the basic statistical summary of the CUE data.
In TABLE I, the sample identification number
provides an indication of the meter's location (these locations
are shown in Fig. 1). $\bar{V}$ is the arithmetic mean of
the speed. $\overline{\log V}$ gives the mean of the logarithmic-speed
distribution. $\sigma$ is the arithmetic standard deviation while
$\sigma_L$ stands for the standard deviation of the logarithmic-speed
distribution. $\Delta P_m$ is the maximum difference observed between
the cumulative probability of the logarithmic-speed distribution
and the cumulative probability of a log-normal
distribution over the same speed range. $P$ gives the cumulative
probability of the empirical distribution at the point
where $\Delta P_m$ occurred.
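The quantities summarized in TABLE I can be illustrated with a minimal Python sketch. It is not the program used in this study: the function name `log_speed_summary`, the dropping of zero speeds, the use of the population (divide-by-N) standard deviation, and the evaluation of the empirical distribution only at its upper step are all assumptions made for illustration.

```python
import math
from statistics import NormalDist, mean, pstdev


def log_speed_summary(speeds):
    """Return TABLE I style statistics for one record of current speeds."""
    # Zero speeds have no logarithm, so this sketch drops them (an assumption).
    positive = [v for v in speeds if v > 0]
    logs = sorted(math.log10(v) for v in positive)
    n = len(logs)
    mean_log, sigma_log = mean(logs), pstdev(logs)
    fitted = NormalDist(mean_log, sigma_log)

    # Scan the empirical C.D.F. for its largest departure from the fitted normal.
    max_diff, p_at_max = 0.0, 0.0
    for rank, x in enumerate(logs, start=1):
        empirical = rank / n
        diff = empirical - fitted.cdf(x)
        if abs(diff) > abs(max_diff):
            max_diff, p_at_max = diff, empirical

    return {
        "mean_speed": mean(positive),     # arithmetic mean of the speed
        "sigma_speed": pstdev(positive),  # arithmetic standard deviation
        "mean_log": mean_log,             # mean of log10(speed)
        "sigma_log": sigma_log,           # standard deviation of log10(speed)
        "max_diff": max_diff,             # signed maximum C.D.F. difference
        "p_at_max": p_at_max,             # empirical probability at that point
    }
```

Calling `log_speed_summary` on a list of speeds in cm/sec returns a dictionary whose entries correspond, in order, to the columns described above.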

2.  Independence of Observations

Time-series data raise questions as to the independence
between consecutive data points, since most statistical
procedures are based on the independence of individual
data samples. Time-series data recorded at short intervals
are usually autocorrelated, which lowers the degree of independence
between observations. The CUE data apparently
are highly autocorrelated. The autocorrelation coefficient
drops to 0.3 when using one observation every four hours.
However, the effects of decimation of the data were not investigated.
Paquette [Ref. 3] assumed that the number of
effective individual data points (needed for goodness-of-fit
tests) in the distributions he used could be obtained
by dividing the total number of observations by the number
of lags needed to reach an autocorrelation coefficient of 0.3. He
showed that decimation to this degree had negligible effect
on the mean and standard deviation of the distribution.
It will be assumed that the same procedure can be followed
with the CUE data.
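The effective-sample-size rule described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the autocorrelation estimator (lagged covariance divided by the full-series sum of squares) is one common choice and may differ from the one Paquette used, and integer division is assumed for the final count.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation coefficient of a series at the given lag."""
    n = len(series)
    m = sum(series) / n
    total = sum((x - m) ** 2 for x in series)
    if total == 0:
        raise ValueError("a constant series has no defined autocorrelation")
    lagged = sum((series[i] - m) * (series[i + lag] - m) for i in range(n - lag))
    return lagged / total


def effective_sample_size(series, threshold=0.3):
    """Return (effective N, lag) using the first lag at or below the threshold."""
    n = len(series)
    for lag in range(1, n):
        if autocorrelation(series, lag) <= threshold:
            # Divide the number of observations by the number of lags.
            return n // lag, lag
    # The threshold was never reached: treat the record as a single sample.
    return 1, n
```

For a record sampled every 10 minutes whose coefficient first falls to 0.3 at a four-hour separation, the lag would be 24 and the effective number of observations would be the record length divided by 24.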

B.   DRIFT-OF-SHIP  DATA 

1.  Source of the Data

The fourth source of data, and one not extensively
utilized by Paquette, came from the files of the National
Oceanographic Data Center (NODC) from their File 111-9. This
file is an extensive set of comparisons of dead-reckoning
positions and corresponding fixes covering the period 1904-1945.
The difference between the dead-reckoning position and the
celestial or electronic fix is ascribed to a current which is
presumed constant over the hours and the many tens of miles
between fixes. NODC furnished computer-generated printouts
which included all information for Marsden Squares (MS)
114, 115, 116, 149, 150 and 151. Their locations are shown
in Fig. 2. Selected data from these printouts were used in
this analysis. Also shown in Fig. 2 are the basic locations
of the Woods Hole current meters whose data were used
both by Paquette and in this thesis. The DOS data were reported
by five-degree quadrants within each ten-degree
Marsden Square (Fig. 3), and then by month, general current
direction, and speed interval within each quadrant (Fig. 4).

2.  Independence of Observations

DOS data have been computed and reported by an uncountable
number of people. The time of day the reports
were made, the types of ships involved, the location of each
ship, and the wind and weather conditions were all unknown factors
which were assumed to have encompassed all possibilities
over the 41-year reporting period. It is known that DOS data
were not recorded when the reported wind speed exceeded
Beaufort 7 or seas exceeded 3.3 m. With all these factors in
mind, it was assumed the DOS data represented basically random
independent samples in the areas from which information
was reported. This did not exclude the possibility that
peculiarities in the speed classes may exist for various
reasons such as errors in grouping the data, bias factors
on the part of the reporting navigators, or the kind of
space and time averaging involved.

III.  COMPUTER PROGRAMS

All  the  statistical  parameters  generated  from  the  data 
used  in  this  paper  were  obtained  using  computer  programs  on 
the  Naval  Postgraduate  School  IBM  360/67  digital  computer. 
Table  II  provides  a  summary  of  the  major  programs  utilized. 
Minor programs were written by the author to perform specific
tasks  throughout  the  course  of  the  investigation  but  these 
did  not  compute  statistical  parameters. 

Program HISTG classifies current-speed data into class
intervals and plots the resulting histogram on the line
printer. This program was used to generate statistics on
the OSU current-meter data, which were received on tape as
individual records, and on data sets keypunched onto computer
cards. CUDIS MOD3 and CURST2 accept data in histogram
form grouped in either even or uneven intervals. CUDIS MOD3
computes statistical information based on the assumption
that the counts in each speed-class interval are
concentrated at the center of the interval, and produces a
cumulative log-normal distribution and plots it on a probability-paper
scale. CURST2 computes statistical information
based on the assumption that the counts in each
speed-class interval are evenly distributed across the width
of the interval. It does not produce a plot. Besides the
information provided in Table II, CURST2 computes the third
and fourth moments of a distribution and the coefficients of
skewness and kurtosis. These are not generated by CUDIS MOD3.
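The difference between the two grouping assumptions can be illustrated with a short Python sketch. It is not a reconstruction of CUDIS MOD3 or CURST2: the function name `grouped_mean_std`, its arguments, and the choice to work in whatever variable the bin edges are expressed in are assumptions for illustration only.

```python
import math


def grouped_mean_std(edges, counts, spread="center"):
    """Mean and standard deviation of histogram data under two assumptions.

    edges  : bin edges, one more entry than counts (uneven widths allowed)
    counts : number of observations in each class interval
    spread : "center"  -> counts concentrated at the interval midpoint
             "uniform" -> counts spread evenly across the interval
    """
    bins = list(zip(edges[:-1], edges[1:], counts))
    total = sum(counts)
    # The mean is the same under both assumptions.
    mean = sum(c * (lo + hi) / 2.0 for lo, hi, c in bins) / total
    if spread == "center":
        second = sum(c * ((lo + hi) / 2.0) ** 2 for lo, hi, c in bins) / total
    elif spread == "uniform":
        # Mean of x**2 over a uniform interval [lo, hi].
        second = sum(c * (lo * lo + lo * hi + hi * hi) / 3.0 for lo, hi, c in bins) / total
    else:
        raise ValueError("spread must be 'center' or 'uniform'")
    return mean, math.sqrt(second - mean * mean)
```

Under the uniform assumption each interval of width w adds w squared over 12 to the variance, so the two approaches agree closely for narrow intervals and diverge as the grouping becomes coarse.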
IV.  MOORED CURRENT-METER DATA

A.   ANALYSIS  APPROACH  USED 

Paquette [Ref. 3] concluded that the current-speed distributions
were log-normal at a level-of-significance of
0.05 or greater by testing each of the 43 series studied with
the K-S statistic. He used the mean and standard deviation
obtained from the data as estimates of these parameters for
the parent population. However, the K-S statistic Paquette
used assumes the parameters of the parent population are not
estimated from the data. According to Lilliefors [Ref. 7],
when the parameters of the parent distribution are estimated
from the data, the probability of a type I error will be
significantly smaller than that given by tables of the K-S
statistic. Lilliefors provides a new table for the critical
values of the deviation for several useful $\alpha$ values. The
values used to construct Fig. 5 were obtained from this
table. The effective number of observations is along the
abscissa with the maximum permissible deviation values
plotted on the ordinate. Thus the results obtained by
Paquette are conservative in that his results are at a higher
level-of-significance than they should be.
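The effect of estimating the parameters from the data can be illustrated by comparing large-sample critical deviations. The sketch below is an assumption-laden illustration, not the source of Fig. 5: the coefficients are the commonly tabulated large-sample approximations (roughly N greater than 30 or so) for the standard K-S test and for the Lilliefors variant, and the function name `critical_deviation` is hypothetical.

```python
import math

# Large-sample coefficients c such that the critical deviation is c / sqrt(N).
# Standard K-S test, parameters specified in advance (assumed table values).
KS_COEFF = {0.20: 1.07, 0.15: 1.14, 0.10: 1.22, 0.05: 1.36, 0.01: 1.63}
# Lilliefors test, mean and standard deviation estimated from the sample.
LILLIEFORS_COEFF = {0.20: 0.736, 0.15: 0.768, 0.10: 0.805, 0.05: 0.886, 0.01: 1.031}


def critical_deviation(n_effective, alpha=0.05, estimated_parameters=True):
    """Approximate maximum permissible C.D.F. deviation for large N."""
    table = LILLIEFORS_COEFF if estimated_parameters else KS_COEFF
    if alpha not in table:
        raise ValueError("alpha must be one of %s" % sorted(table))
    return table[alpha] / math.sqrt(n_effective)
```

With 100 effective observations at the 0.05 level, the permissible deviation is about 0.089 when the parameters are estimated and about 0.136 when they are not, so a deviation accepted under the standard table may be rejected under the Lilliefors table.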

All current-speed data used by Paquette [Ref. 3] and in
this thesis were generally in histogram form. It is realized
the K-S test was derived for ungrouped data and that its
behavior is less well understood when using grouped data.
Current-meter data were grouped in only one cm/sec intervals
and appeared more or less as continuous data. However, DOS
data were highly grouped and less acceptable for application
of the K-S statistic. Therefore, it was assumed that the
DOS data would give a larger maximum difference between
cumulative distributions than ungrouped data (an assumption
which seemed reasonable), and that the K-S test would give
a reasonable result that was somewhat liberal (reject more
than it would if the data were not grouped). More work on
this subject is needed but is left for future studies.

However, if the current-speed distributions are in
general log-normal, one might assume that the normalized
logarithmic distributions derived from the many time series
ought to be comparable and members of an ensemble of distributions.
Then one may test the fit by examining the
deviations of the cumulative distribution function (C.D.F.)
of the data from the cumulative log-normal distribution at
a number of values of the normalized deviation of the logarithmic
speed,

$$\frac{\log V - \overline{\log V}}{\sigma_L},$$

where $\log V$ is the logarithm
to the base ten of any speed value $V$, $\overline{\log V}$ is the mean of
the logarithmic-speed distribution, and $\sigma_L$ is the standard
deviation of this distribution. This technique has the advantage
of examining all of the series together, looking for
an overall systematic difference from the ideal and looking
at the distribution of the difference values at each normalized
deviation point selected.
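The difference values at selected normalized deviations can be sketched in Python as follows. This is an illustration only: the function name `ndm_differences` is hypothetical, the nine NDM points listed are an assumed set (the text names only some of them explicitly), and ungrouped speeds with a population standard deviation are assumed.

```python
import bisect
import math
from statistics import NormalDist, mean, pstdev

# Assumed set of nine normalized deviations from the mean, in sigma units.
NDM_POINTS = (-3, -2, -1, -0.5, 0, 0.5, 1, 2, 3)


def ndm_differences(speeds, ndm_points=NDM_POINTS):
    """Observed minus predicted cumulative probability at each NDM."""
    logs = sorted(math.log10(v) for v in speeds if v > 0)
    n = len(logs)
    mean_log, sigma_log = mean(logs), pstdev(logs)
    standard_normal = NormalDist()
    differences = {}
    for z in ndm_points:
        # Log-speed lying z standard deviations from the mean log-speed.
        cutoff = mean_log + z * sigma_log
        # Fraction of observations at or below the cutoff.
        observed = bisect.bisect_right(logs, cutoff) / n
        differences[z] = observed - standard_normal.cdf(z)
    return differences
```

Applying `ndm_differences` to each record and collecting the values at a given NDM across all records gives one set of difference values per NDM, which is the kind of ensemble the technique examines.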

Besides the goodness-of-fit to the normal cumulative distribution
curve, the coefficients of skewness and kurtosis
were examined as they relate to each other on a Pearson diagram.
These coefficients also might be expected to be comparable
if the curves are similar. However, as pointed
out by Pearson [Ref. 8], different distributions can have
the same first four moments. These coefficients apparently
have not been used previously in studies of currents.

B.  DEGREE OF NORMALITY OF TIME-SERIES LOGARITHMIC-SPEED DATA

1.  Data Used and Presentation Methods

The data used in this approach were part of the same
time-series data used by Paquette and included all the SCARF
data and all the Webster and Fofonoff data, 29 time-series
data sets in all. Figure 6 is a plot of the difference values
(observed minus predicted logarithmic cumulative probabilities)
for the 29 time-series data sets at nine normalized deviations
from the mean (NDM). Difference values are noted along the
abscissa while the nine NDM values selected are indicated
along the ordinate. Plus and minus three sigma units were
used as the limits of the NDM values because the time-series
records did not provide sufficient values for analysis beyond
these points. Bar plots of the difference values at
each NDM are given showing the range of values observed.
A smooth curve was faired through the mean value at each
NDM considered.

Table III provides summary statistics of the data
used to construct Fig. 6. Not all of the 29 time-series
data sets extended out to the two and three sigma locations.
The last column of this table provides the results of a K-S
goodness-of-fit test for normality of the difference values
at each NDM assuming a mean value of zero. Under the hypothesis
that the log-speed transformation produces a normal
distribution from current-speed records, it is assumed that
difference values at each NDM are random and come from a
nearly normal population whose mean is zero.

It is readily apparent in Fig. 6 that the range of
difference values includes the zero value in all instances.
However, the distributions of the difference values are not
in general symmetric about zero. This is not too surprising
since any subsample drawn from a parent population will most
likely not possess the same mean as the parent population.
A smooth curve through the mean values at each NDM shows a
systematic "S" shape variation from the log-normal curve.

2.  Significance of Observed Results

In order to determine the significance of the "S"
shape variation in Fig. 6, one must examine some of the statistical
values provided in Table III. To aid in this examination,
Table IV is given which shows some of the computations
and values required in the following analysis. Columns 1, 2,
3, 6 and 7 of Table IV are repeated from Table III. Brooks
and Carruthers [Ref. 9] provide computations for the standard
error of the coefficient of skewness of any series of N random
numbers (p. 55), and the standard error of a single observation
from a sample of N observations (p. 40). "t" in
Table IV is the value for a "Student" t-distribution, $\nu$ is
the degrees-of-freedom for that distribution, and $\alpha$ is the
significance level obtained when entering "Student" t-tables
for a two-tailed level-of-significance test. The usage of
these values will be explained later.

a.   Symmetry  of  the  Data  at  Each  NDM 

It is noted in Table IV that at all but one of
the nine NDM's, the coefficient of skewness of the individual
sets of difference values is less than one. Brooks and
Carruthers [Ref. 9] point out that any set of N random numbers
will show a certain amount of skewness (p. 55); however, an
absolute value of the coefficient of skewness less than one
indicates data only moderately skewed (p. 56). They also
specify that the skewness can be considered real only when
the coefficient of skewness exceeds twice the value of the
standard error (p. 55). It seems only logical that the
larger the value of N, the better the confidence in these
statements. By comparison of columns three and five in
Table IV, it is seen that except for the NDM value of three
sigma, the majority of the coefficients of skewness fall
significantly short of being equal to twice the value of the
standard error. Since skewness is an indication of the symmetry
of a distribution about its mean, the indication from
Table IV is that at each NDM except three sigma, the difference
values are basically symmetrical about their means.
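The "twice the standard error" rule for skewness can be sketched in Python. This is an illustration under assumptions: the function name `skewness_check` is hypothetical, the moment-based coefficient of skewness is assumed, and the standard error is taken as the common large-sample approximation, the square root of 6/N, which may differ from the exact expression given by Brooks and Carruthers.

```python
import math


def skewness_check(values):
    """Return (coefficient of skewness, standard error, is_real)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - m) ** 3 for x in values) / n  # third central moment
    skew = m3 / m2 ** 1.5
    # Assumed large-sample approximation for the standard error of skewness.
    std_err = math.sqrt(6.0 / n)
    # Skewness is treated as real only if it exceeds twice its standard error.
    return skew, std_err, abs(skew) > 2.0 * std_err
```

For a set of 29 difference values the assumed standard error is about 0.45, so under this approximation a coefficient of skewness would have to exceed roughly 0.9 in absolute value to be considered real.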

Since the data at the NDM values are basically
symmetrical about their means and since a review of the
histograms of the data shows in general normal-type distributions,
except at three sigma where the data exhibit a
definite "J" shaped distribution to the left, a test was
made to determine to what degree the data were normal about
the hypothesized mean of zero. A K-S test was conducted
with the standard deviation estimated from the data. The
results are given in the last column of Table III. The
figures given are the level of significance or $\alpha$ values of
the test obtained from Fig. 5. The amount of reduction in
the $\alpha$ value due to estimating only the standard deviation
from the data was not known, but it was assumed to be significant
and therefore Lilliefors' results were used. It
appears the normal hypothesis could be rejected on the basis
of the evidence from these data, at a significance level of
0.008 or below.

b.  Significance of the Deviations of the Means

As stated previously, it is a known fact that
any subsample from a large population will most likely not
have the same mean as the parent population. Brooks and
Carruthers [Ref. 9, p. 65] demonstrate a method of testing
whether a mean $M$ from a subsample differs significantly
from a postulated population mean $M'$. The test can be made
using the well-known "Student" t-distribution where the
t-value is computed by:

$$t = \frac{M - M'}{\sigma/\sqrt{N}}$$

$\sigma$ is the estimate of the population standard deviation
derived from the sample, and the distribution of "t" is associated
with N-1 degrees-of-freedom. This fact can be used
to test the significance of the deviation of the mean from
zero at the individual NDM's. At each of the NDM's the
value of $M'$ is zero, and the values for the other computations
to derive "t" are given in Table IV. The test hypothesis
is that the subsample mean does not differ significantly
from zero.
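The t-value computation described above can be sketched in Python. The function name `t_statistic` is hypothetical, and the sketch assumes that the estimate of the population standard deviation uses the N-1 divisor; the significance level would still be read from t-tables or a chart such as the one used in this thesis.

```python
import math
from statistics import mean, stdev


def t_statistic(differences, postulated_mean=0.0):
    """Return ("Student" t-value, degrees-of-freedom) for a sample mean."""
    n = len(differences)
    sample_mean = mean(differences)
    # Estimate of the population standard deviation (N-1 divisor, assumed).
    sigma = stdev(differences)
    t = (sample_mean - postulated_mean) / (sigma / math.sqrt(n))
    return t, n - 1
```

Passing the difference values observed at one NDM, with the postulated mean left at zero, gives the t-value and degrees-of-freedom needed to look up the two-tailed level of significance.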

Figure 7 is derived from the t-tables and can be
used for the t-tests in this thesis. The "Student" t-value
is given along the abscissa with level of significance on
the ordinate. The curves are for different values of
degrees-of-freedom. Enter with the t-value and degrees-of-freedom
and read off $\alpha$ on the ordinate.

We can infer therefore from the results in Table
IV that the departure from the mean of zero at each NDM value
is probably significant except possibly at NDM values of -2
and 0.5. These latter two means are near zero anyway and
are near crossing points in the curve. Therefore the "S"
shaped curve in Fig. 6 is indeed most probably real, and not
a sampling artifact.

c.   An  Engineering  Viewpoint  of  the  Significance  of 
Results 

Perhaps more important than significance in
terms of probability is the utility of this information from
an engineering viewpoint. If one is concerned with the maximum
current to be expected on an object being placed in the
ocean, he is concerned with the high-speed tails of the distribution
being correct. At a NDM of two sigma the normal
probability ought to be 0.9772. The maximum difference value
observed from the data was 0.023. Relative to the residual
probability remaining, 1 - 0.9772, this gives an error
that is not quite as great as one times the
residual probability remaining. At the NDM of three sigma
this error increases to over sixteen times the residual
probability remaining. Therefore, the data at the three
sigma value are unreliable for use, as will be shown below.

d.   General  Summary  and  Possible  Errors 

It can be said that the "S" shape curve in Fig. 6
is most probably real as shown and not due to chance. In
general the type of curve represented by the smooth curve in
Fig. 6 is one which contains slightly fewer data points below
the mean and slightly more data points above the mean
than would be expected of normally distributed data. To
put Fig. 6 into a better perspective to indicate just how
much deviation is being shown, a more familiar representation
of the CDF's of the curves in question is shown in Fig.
8. As can be seen, the maximum deviations are small and the
CDF's of the two distributions are almost identical.

Perhaps one could argue that the systematic
deviation in Fig. 6 could be caused by measurement errors or
errors in treating the data. This could possibly be true if
it were not for the fact that the data used for Fig. 6 came
from two separate sources and that a similar plot using only
the eleven sets of CUE data (which is yet a third source and
one which used a different type of current meter, the
Aanderaa, than the SCARF and Webster and Fofonoff data)
showed the same general variation. Therefore the reason for
this systematic variation is not clear. Some possible causes
could be the occurrence of events such as storms which may
produce anomalous water velocities for a substantial fraction
of the recording period, mooring transits, and excessive
oscillation of the buoy during the recording period. All of
these could conceivably produce the type of effect noticed.

e.  Limit of Usefulness

It  was  shown  that  at  all  NDM's,  the  difference 
values  obtained  were  basically  symmetrical  about  their  mean 
and  the  histograms  indicated  possibilities  of  normality  except 
at  the  three  sigma  location.   The  distribution  here  was  "J" 
shaped  trailing  off  to  the  left  or  towards  higher  negative 
difference values. This says that at an NDM of three there
are in general fewer observations than a normal curve fitted
to the data would predict. This is to be expected, since the
current meters tend to record fewer high speeds than actually
occur because the speed dots coalesce on the recording film.
Therefore it appears that an NDM value of three is beyond the
range in which the current meter provides satisfactory data
for analysis. Although current meters do have problems with
stalling of the rotor at the low-speed end of the scale, the
data at an NDM value of minus three do not indicate any
problems, so it is assumed this value is within the useful
range of the current meters used. Although data obtained for
very high and very low speeds are suspect, no explicit
attention has been paid to this in the process of statistical
estimation. New "robust" procedures that account for such
data difficulties are described by Andrews [Ref. 10].
C.   PEARSON  DIAGRAM

1.  Presentation  Method 

Another means of determining the type of distribution that
data may represent is the Pearson diagram. This is a diagram
on which the square of the coefficient of skewness, β₁, is
plotted against the coefficient of kurtosis, β₂. Pearson
[Ref. 11] showed that different regions of the β₁, β₂ space
correspond to several different theoretical distribution
curves. Table V provides the β₁ and β₂ values for the
logarithmic time-series data previously considered plus these
values for the CUE data which will now be included for
analysis. Figure 9 is a Pearson diagram on which the β₁, β₂
values from Table V are plotted.
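
As an illustrative sketch (not the program used in this study), the
coordinates of a data set on a Pearson diagram can be computed from the
second, third and fourth central moments of the logarithmic speeds. The
function name and the sample speeds below are hypothetical, and the
moments are assumed to use the simple divisor N.

```python
import math

def pearson_coordinates(log_speeds):
    """Return (beta1, beta2): squared skewness and kurtosis of a sample."""
    n = len(log_speeds)
    mean = sum(log_speeds) / n
    # Central moments of order 2, 3 and 4 (divisor n, an assumption).
    m2 = sum((x - mean) ** 2 for x in log_speeds) / n
    m3 = sum((x - mean) ** 3 for x in log_speeds) / n
    m4 = sum((x - mean) ** 4 for x in log_speeds) / n
    beta1 = m3 ** 2 / m2 ** 3  # square of the coefficient of skewness
    beta2 = m4 / m2 ** 2       # coefficient of kurtosis
    return beta1, beta2

# Hypothetical speeds in knots, not taken from the thesis data.
speeds = [0.2, 0.35, 0.5, 0.5, 0.8, 1.1, 1.6, 2.4]
b1, b2 = pearson_coordinates([math.log10(v) for v in speeds])
```

A perfectly symmetric sample gives a first coordinate of zero, and a
normal curve corresponds to the point (0, 3) on the diagram.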

2.  Indication of Data Errors

The plotted points in Fig. 9 appear to show an excessive
spread. However, further investigation into comments
concerning the recording of the SCARF and Webster and Fofonoff
data showed that about 63% of the data sets having a β₂ value
of 4.5 or above experienced marked quantization in speeds,
higher than normal speeds due to mooring transits, or
excessive buoy oscillations while in place. About 67% of the
distributions with values of β₁ equal to 1.0 or greater showed
these same characteristics. Only one of the CUE data records
plotted in the region just discussed, but no detailed
information on those records was readily available.

One of the errors mentioned above, high speeds due to
mooring transits, does add a quantity of high-speed values
to a speed record. Transient phenomena such as storms and
influences from high-speed current regimes can also cause
an excess of high-speed current values. These high values
cause the distribution to be more positively skewed than
otherwise would be expected. These factors could distort
a speed record significantly if the total recording time is
small. Many of the records with large β₁, β₂ values were
also of relatively short duration (less than a day).
Pearson [Ref. 8, p. 285] discusses this problem of long tails
on a distribution and shows that the contribution to moments
from the tails increases significantly as the order of the
moment increases. For instance, the contribution to the
fourth moment from areas in the outer .001 part of the tail,
of a distribution with β₁, β₂ values of 2.79 and 9.01
respectively, is about 41%. This contribution increases to
74.2% if the outer .01 part of the tail is considered. Since
the β₁ and β₂ values depend on the second, third and fourth
moments, erroneous speed values which extend the tails of a
distribution will have a significant effect on where a
distribution plots on a Pearson diagram.
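
The growth of the tail contribution with the order of the moment can be
illustrated with a small sketch. It is not Pearson's calculation: the
function name and the record are hypothetical, and the "tail" is assumed
here to mean the observations with the largest absolute deviation from
the mean, counted from both ends.

```python
def tail_share_of_moment(values, order, tail_fraction):
    """Fraction of the sum of |x - mean|**order supplied by the most
    extreme tail_fraction of the observations."""
    n = len(values)
    mean = sum(values) / n
    # Contribution of each observation to the moment, smallest first.
    terms = sorted(abs(x - mean) ** order for x in values)
    k = max(1, round(tail_fraction * n))  # observations counted as tail
    return sum(terms[-k:]) / sum(terms)

# Hypothetical record: 98 ordinary values and 2 extreme ones.
record = [-1.0, 1.0] * 49 + [-3.0, 3.0]
share_second = tail_share_of_moment(record, 2, 0.02)
share_fourth = tail_share_of_moment(record, 4, 0.02)
```

In this made-up record the two extreme values supply about 16% of the
second moment but about 62% of the fourth, which is the behavior
described above.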

It would be impossible, without a highly detailed
study, to ascertain to what extent the three errors mentioned
influenced the data, but it is fairly obvious that some had
significant influences on the high values of β₁ and β₂
observed in Fig. 9. None of these problems were noted in data
which exhibited β₁, β₂ values smaller than the values given
above.
3.   Summary  of  Results

A normal curve will generate β₁ and β₂ values of 0.0
and 3.0 respectively. A grouping of points about the (0,3)
value would be an indication the log speed transformation
was a good fit. Except for the points previously mentioned,
the majority of the logarithmic time-series data plot
closely grouped about the (0,3) point. Again the indication
is that the log-normal approach to current-meter time-series
data produces a near-normal distribution. This diagram will
be used later to compare with the DOS data.

D'Agostino and Pearson [Ref. 12] and Bowman [Ref. 13]
have published recent articles on the use of the β₁, β₂
statistics in testing normality of a data set. Their
procedures were not used in this thesis, but are referenced
for future use.


V.   DRIFT-OF-SHIP  DATA 

The  attention  of  the  analysis  effort  then  shifted  to  the 
DOS  data.   The  number  of  locations  where  current  meters  have 
recorded  measurements  is  small  in  comparison  to  the  total 
area  of  the  ocean.   However,  DOS  information  is  available 
over  a  large  percentage  of  both  the  Atlantic  and  Pacific 
Oceans.   If  a  suitable  distribution  for  these  speeds  could  be
found,  a  method  would  be  available  for  estimating  the  speeds 
probabilistically.   This  requires  also  some  way  of  estimating 
the  second  moment,  a  quantity  which  is  not  charted  on  the 
current  charts.   Since  DOS  data  are  somewhat  different  and 
probably  more  distorted  than  current-meter  data,  it  is  de- 
sirable to  use  current-meter  data  to  help  correct  the 
distortions.
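
A minimal sketch of such a probabilistic estimate follows, assuming the
logarithm (base 10) of speed is normally distributed. The function name
and the parameter values are hypothetical; the logarithmic mean stands
for a charted speed and the logarithmic standard deviation is the
second-moment quantity that would have to be estimated separately.

```python
import math

def exceedance_probability(speed, log_mean, log_sd):
    """P(V > speed) when log10(V) is normal with the given mean and
    standard deviation (both in log10 knots)."""
    z = (math.log10(speed) - log_mean) / log_sd
    # Upper-tail area of the standard normal curve.
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Hypothetical parameters: median speed 0.5 knot, logarithmic sigma 0.3.
log_mean = math.log10(0.5)
p_two_knots = exceedance_probability(2.0, log_mean, 0.3)
```

With these assumed values a speed of two knots lies about two sigma
units above the mean, so the estimated chance of exceeding it is
roughly 2%.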

It  is  recognized  that  ocean  currents  usually  decrease  with
depth.   This  is  an  important  part  of  the  current  prediction 
problem  to  engineers.   The  present  study  does  not  enter  into 
this  problem. 

A.   IRREGULARITIES  IN  DOS  DATA 

DOS  data  used  in  this  study  appear  to  suffer  some  ir- 
regularities at  both  ends  of  the  speed  spectrum.   This  is 
discussed  to  some  degree  by  Paquette.   The  four  knot  speed 
class  (see  Fig.  3)  includes  all  accepted  speeds  four  knots 
and  greater.   This  has  the  effect  of  requiring  one  to  place 
a  limit  on  the  upper  class  interval  in  order  to  proceed 
with distributional investigations. Herein enters one
possibility for error. Although the total number of occurrences
in  this  speed  class  is  small  in  comparison  to  the  total  count, 
the  generated  errors  could  be  significant.   A  speed  of  4.5 
knots  was  chosen  as  the  top  limit  for  this  analysis. 

The  lowest  class  interval  also  presents  a  problem.   Al- 
though described  as  "calm,"  its  upper  boundary  is  slightly 
less  than  0.1  knot.   It  is  assumed  that  true  zeros  do  not 
exist  and  the  lower  boundary  is  placed  at  0.01  knot.   Small 
changes  in  this  arbitrary  choice  have  considerable  effect 
when  the  logarithmic  transformation  is  made.   Furthermore, 
after  transformation  the  class  interval  is  too  large  to 
properly  represent  the  tail  of  the  curve.   A  pictorially 
nicer  technique  would  be  to  distribute  the  counts  in  this 
interval  into  several  intervals  according  to  a  rule  consis- 
tent with  the  log-normal  curve.   This  seemed  like  too  much 
tampering  with  the  data  and  the  above  simple  course  was 
followed. 
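
The sensitivity to the lower boundary can be seen from a short worked
sketch. The upper boundary is taken as 0.1 knot for illustration (the
text gives it as slightly less), the trial lower boundaries other than
0.01 knot are hypothetical, and the function name is not from the
programs used in this study.

```python
import math

def log_class(lower, upper):
    """Return (center, width) of a speed class in log10 units."""
    lo, hi = math.log10(lower), math.log10(upper)
    return (lo + hi) / 2.0, hi - lo

# Upper boundary assumed to be 0.1 knot; lower boundaries are trial values.
for lower in (0.05, 0.01, 0.001):
    center, width = log_class(lower, 0.1)
    print(f"lower={lower:<6} center={center:6.2f} width={width:5.2f}")
```

Moving the lower boundary from 0.01 to 0.001 knot shifts the center of
the class by half a decade and doubles its width in logarithmic units,
although the change is negligible on the untransformed speed scale.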

It  is  apparent  that  wind  effect  included  in  the  recorded 
speeds  is  impossible  to  ascertain.   It  could  add  to  or  re- 
duce from  the  true  current  speed.   This  would  vary  with  wind 
speed,  direction  of  ship  travel  relative  to  the  wind,  and 
from  ship  to  ship.   No  wind  correction  factors  were  entered. 
As  was  mentioned,  data  taken  when  the  winds  were  above  Force 
7  are  excluded  from  the  data.   While  this  reduces  the  effect
of  excessive  wind-drift  of  the  ship,  it  also  eliminates  the 
higher  speeds  of  wind-driven  current. 
It  is  also  to  be  noted  that  the  DOS  data  are  averages  in
time  and  space.   This  averaging  will  smooth  sharp  high  and 
low  peaks  and  will  reduce  the  apparent  numbers,  especially 
of  the  high  speeds. 

Human  error  certainly  enters  into  the  results.   In  most 
cases  one  expects  this  to  be  Gaussian  error  and  to  have  little 
effect  except  to  increase  the  standard  deviation  slightly. 
However,  there  appears  to  be  a  significant  bias  at  the  low- 
speed  end  of  the  curve  which  will  be  discussed  in  the  next 
section. 

B.   A  NECESSARY  DATA  ALTERATION 

There  is  an  apparent  anomaly  in  the  "Calms"  and  0.1  knot 
speed  classes.   It  is  believed  that  this  is  an  artifact 
arising  from  a  natural  but  unjustified  pride  in  precision 
of  celestial  and  electronic  fixes.   There  is  nearly  always 
some  scatter  among  the  navigational  lines  of  position.   It 
would  be  natural  for  the  navigator  to  be  biased  toward  those 
which  agreed  with  the  dead-reckoning  position.   So  it  would 
not  be  surprising  to  find  more  recorded  "calms"  than  actually 
occurred. 

1.   Alteration  Indicated  by  e.c.d.f.

This  appeared  to  be  an  explanation  for  the  results 
observed  when  the  empirical  cumulative  distribution  function 
(e.c.d.f.)  of  the  DOS  data  is  plotted.   The  e.c.d.f.  is  a 
plot  of  the  i-th  ordered  value  as  ordinate  against  (i-½)/N
as  abscissa.   N  is  the  total  data  count.   In  one-dimensional 
samples,  it  provides  an  exhaustive  representation  of  the  data 
under  the  following  broad  assumptions:   (i)  that  the  order
of  the  observations  is  immaterial,  (ii)  that  there  is  no 
classification  of  the  observations,  based  on  extraneous  con- 
siderations, which  one  wishes  to  employ;  and  (iii)  if  the 
sample  is  non-random,  then  appropriate  weights  are  specified. 
Wilk  and  Gnanadesikan  [Ref.  14]  discuss  the  significant  ad- 
vantages of  using  the  e.c.d.f.  in  a  descriptive  test  of  data. 
It  is  pointed  out  by  them  that  the  e.c.d.f.  "is  a  robust 
carrier  of  information  on  location,  spread  and  shape,  and  an 
effective  indicator  of  peculiarities"  (p.  2).   Figure  10, 
included  as  an  example  of  the  type  of  plot  one  might  expect 
to  see  from  a  log-normal  data  series  exhibiting  no  readily 
apparent  data  irregularities,  is  the  e.c.d.f.  for  Webster 
and  Fofonoff  measurement  No.  1012  (WF  1012).   One  sees  bas- 
ically a  smooth  flow  of  the  data  from  one  end  to  the  other. 
Plotted  in  Figure  11  is  the  e.c.d.f.  of  MS  115,  quadrant  1, 
month  10.   The  data  flow  appears  smooth  in  the  upper  60%  of 
the  observations,  but  some  peculiarity  is  evident  in  the 
lower  end  of  the  data.   It  was  felt  two  basic  reasons  caused 
this  to  occur.   One  is  the  lack  of  resolution  of  speeds  in 
the  region  near  zero.   However,  despite  this  factor,  it  ap- 
pears likely  the  main  reason  is  that  too  many  observations 
occur  in  the  "calm"  class.   If  some  were  transferred  to  the 
0.1  knot  speed  class,  the  e.c.d.f.  plot  would  appear  smoother.
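
The plotting positions of the e.c.d.f. as defined above can be sketched
as follows. The function name and the sample values are hypothetical.

```python
def ecdf_points(values):
    """Return (abscissa, ordinate) pairs: ((i - 1/2)/N, i-th ordered value)."""
    ordered = sorted(values)
    n = len(ordered)
    return [((i - 0.5) / n, x) for i, x in enumerate(ordered, start=1)]

# Hypothetical logarithmic speeds, not thesis data.
points = ecdf_points([-0.2, -1.0, 0.1, -0.5])
```

For grouped data such as the DOS records every observation in a class
shares one value, so an overfilled class shows up as a long flat step
in the plot.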

Plots  of  the  e.c.d.f.  of  other  sets  of  DOS  data  showed 
similar  traits  to  varying  degrees.  It  was  not  feasible  to 
investigate  this  characteristic  of  the  data  more  thoroughly 
at  this  time.   Therefore,  a  partial  correction  was  made  by
arbitrarily  shifting  nine-tenths  of  the  counts  in  the  "calm" 
interval  into  the  0.1  knot  interval.   Figure  12  is  the  e.c.d.f. 
for  the  data  of  Fig.  11  altered  in  this  way.   It  shows  a  much 
smoother  data  fit  and  one  which  generally  resembles  the 
current-meter  data  of  Fig.  10,  except  for  the  inflection  at 
the  lower  end  which  may  be  obscured  by  the  coarseness  of  the 
subdivision  into  intervals. 
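
The nine-tenths shift amounts to a simple transfer of counts between the
first two class intervals. The sketch below assumes the counts are held
in a list whose first entry is the "calm" class and whose second entry
is the 0.1 knot class; the function name and the counts are
hypothetical.

```python
def shift_calms(counts, fraction=0.9):
    """Move a fraction of the 'calm' counts into the 0.1 knot class.

    counts[0] is assumed to be the 'calm' class and counts[1] the
    0.1 knot class; all other classes are left unchanged."""
    moved = counts[0] * fraction
    altered = list(counts)
    altered[0] -= moved
    altered[1] += moved
    return altered

# Hypothetical class counts, not thesis data.
altered = shift_calms([100, 50, 30])
```

The total count is unchanged by the transfer, so the cumulative
probabilities of all higher classes are unaffected.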

2.   Alteration  Indicated  by  Probability  Density  Plot

Visual  examination  of  the  log-normal  probability- 
paper  plots  for  the  cases  studied  showed  this  arbitrary 
change  to  be  at  least  approximately  correct.   As  an  example, 
Fig.  13  shows  two  separate  logarithmic  probability  density 
plots for MS 116-4-6 and MS 116-3-9. Each includes the
results of one unchanged data base and one corrected as
discussed above. The effect of the nine-tenths shift is very
dramatic  and  produces  a  more  normal  appearing  probability 
density  plot.   No  further  investigation  was  done  to  determine 
whether  the  nine-tenths  shift  was  an  optimum  alteration. 

C.   DOS  STATISTICAL  PARAMETERS 

1.   Data  Used  and  Presentation  Methods

Fifty  different  months  of  DOS  data  were  selected  for 
analysis  from  areas  generally  off  the  northeast  coast  of  the 
United  States.   Each  set  of  data  was  distributed  over  a  five- 
degree  square.   The  squares  and  months  were  selected  to  pro- 
vide data  from  within  and  outside  areas  of  expected  high  current 
velocities  at  different  times  of  the  year  due  to  major  current 
systems.  These  fifty  data  sets  were  then  altered  by  the
nine-tenths  data  shift  and  then  analyzed  with  the  aid  of 
the  computer  programs  CUDIS  M0D3  and  CURST2. 

Table VI provides the statistical summary of the
data generated by these programs. Columns one and two
identify the data sets and indicate the number of speed class
intervals into which the data is divided as well as the total
speed observations per data set. The arithmetic mean (V) and
standard deviation (σ) are given in columns three and four
respectively. Columns five through eight provide the
logarithmic statistics for each data set and include, in the
order given, mean (Log V), standard deviation (σ of Log V),
coefficient of skewness and coefficient of kurtosis. As
defined before, ΔPm is the maximum difference between the
logarithmic-speed cumulative probability and a log-normal
cumulative probability, while P is the value of the
logarithmic-speed cumulative probability where ΔPm occurs.
For comparison, these values are also given for the curve
that existed prior to the nine-tenths alteration.

2.   Comparison  With  Current-Meter  Data

Several  results  become  readily  apparent  from  Table 
VI  when  compared  with  Paquette's  work  [Ref.  3]  and  Table  I 
of  this  report.  The  logarithmic  standard  deviation  appears 
to  be  grouped  into  narrow  limits  between  0.24  and  0.36. 
This  measurement  for  the  current-meter  data  ranged  between 
0.10  and  0.538.  This  is  attributed  to  the  grouping  of  the 
DOS  data  and  the  limit  placed  on  the  high-speed  end  of  the 
DOS data. All the DOS data sets except one show a small to
moderate negative skewness. As can be seen from Table V,
this was generally true for the current-meter data; however,
some exhibited skewness coefficients that were positive and
some that were negative but greater than minus one. The
coefficients  of  kurtosis  for  the  DOS  data  ranged  between 
2.49  and  5.358  while  for  logarithmic  current-meter  time- 
series  data  they  ranged  in  value  from  2.46  to  12.91.   The 
two  sets  of  data  look  generally  alike  except  the  current- 
meter  data  is  considerably  more  variable.   Some  deviations 
in  the  current-meter  results  are  so  extreme  that  peculiar- 
ities in  the  data  are  suggested. 

3.   K-S  Test  of  Normality  of  Each  Data  Set 

In  order  to  produce  a  numerical  measure  of  closeness 
of  fit  of  the  logarithmic  current-meter  distributions  to  the 
log-normal,  Paquette  [Ref.  3]  applied  the  K-S  statistic  as 
previously mentioned. The K-S statistic uses the maximum
deviation in absolute value between the empirical and
theoretical cumulative distributions (ΔPm in Table VI) and the
effective number of observations (number of independent
observations) to derive a level of significance for the fit.

The K-S test was applied to the DOS data in Table VI
using Lilliefors' results. The total number of observations
in each data set was used to enter Fig. 5. At an α level
of 0.05, the maximum permissible deviation was obtained from
the ordinate. If this value was greater than ΔPm in Table
VI, the normal hypothesis could not be rejected on the basis
of the evidence from this data at the significance level of
0.05. Ninety percent of the DOS data sets passed the K-S
test at a significance level of 0.05 or greater. The same
procedure was used to test the data sets prior to the
nine-tenths alteration. Nearly 88% of the unaltered DOS data
sets failed the K-S test at the 0.05 level of significance.
It therefore appears that the nine-tenths data alteration in
the first speed class produces a much more normal
logarithmic-speed distribution.
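
The test procedure can be sketched as below. This is not the procedure
of Fig. 5: the sketch assumes ungrouped observations, estimates the mean
and standard deviation from the sample, and replaces the graphical
limit by the large-sample approximation 0.886/√N commonly quoted from
Lilliefors' table for α = 0.05 (N greater than 30). All names and the
simulated sample are hypothetical.

```python
import math
import random

def normal_cdf(x, mean, sd):
    """Cumulative probability of a normal curve."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def max_cdf_deviation(log_speeds):
    """Maximum absolute difference between the empirical step function
    and a normal curve fitted with the sample mean and sigma."""
    n = len(log_speeds)
    mean = sum(log_speeds) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in log_speeds) / (n - 1))
    d = 0.0
    for i, x in enumerate(sorted(log_speeds), start=1):
        f = normal_cdf(x, mean, sd)
        # Check the step function just above and just below each point.
        d = max(d, abs(i / n - f), abs((i - 1) / n - f))
    return d

def lilliefors_limit_5pct(n):
    """Assumed large-sample limit at the 0.05 level (n > 30)."""
    return 0.886 / math.sqrt(n)

# Simulated logarithmic speeds, standing in for one data set.
rng = random.Random(0)
log_speeds = [rng.gauss(-0.3, 0.3) for _ in range(200)]
delta_pm = max_cdf_deviation(log_speeds)
not_rejected = delta_pm < lilliefors_limit_5pct(len(log_speeds))
```

The normal hypothesis is not rejected when the computed deviation is
smaller than the limit, which mirrors the comparison made against the
ordinate of Fig. 5.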

D.   DEGREE  OF  NORMALITY  OF  DOS  LOGARITHMIC-SPEED  DATA 
1.   Data  Presentation

The  same  procedures  as  used  with  the  current-meter 
data  were  applied  to  the  DOS  data.   A  bar  plot  of  the  dif- 
ference values  between  the  observed  and  predicted  cumulative 
distributions  at  designated  NDM  values  is  presented  in  Fig. 
14.   The  striking  resemblance  in  shape  to  Fig.  6  is  readily 
apparent.   Table  VII  provides  a  summary  of  the  statistics 
from  the  data  used  in  constructing  Fig.  14.   This  table  cor- 
responds to  Table  III.   The  distribution  of  the  difference 
values  at  the  individual  NDM's  is  not  entirely  symmetric 
about  zero,  and  a  curve  smoothed  through  the  mean  value  at 
each  NDM  shows  a  slight  "S"  shaped  systematic  variation  from 
the  normal  curve. 

The distribution represented by the smooth curve
through the mean values of the DOS difference values in Fig.
14 is nearly the same as that for the current-meter data,
except that it is more symmetrical in shape than the curve
in Fig. 6.
2.   Symmetry  of  the  Data  at  Each  NDM  and  Overall  Limits
of  Data  Usefulness

Table  VIII  is  like  Table  IV  except  that  the  figures 
come  from  the  DOS  data  under  consideration.   A  review  of  the 
coefficients  of  skewness  in  Table  VIII  shows  that  except  at
the  three  sigma  location,  all  values  are  significantly  less 
than  unity,  indicating  only  moderate  skewness.   All  but  two 
of  the  coefficients  of  skewness  are  significantly  less  than 
twice  the  standard  error,  indicating  that  their  skewness  is 
probably  not  real  and  the  data  are  nearly  symmetric  about 
their  means.   NDM  values  of  minus  one  and  three  showed  signs 
of  real  skewness  in  the  distribution  of  difference  values. 
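
The comparison against twice the standard error can be sketched as
follows. The text does not state which formula for the standard error
was used; the sketch assumes the common large-sample approximation
√(6/N) for samples from a normal population, and the function name and
numerical values are hypothetical.

```python
import math

def skewness_exceeds_two_se(skewness, n):
    """Return (flag, se): whether |skewness| exceeds twice its assumed
    standard error sqrt(6/n), and that standard error."""
    se = math.sqrt(6.0 / n)
    return abs(skewness) > 2.0 * se, se

# Hypothetical coefficient of skewness from 50 difference values.
flag, se = skewness_exceeds_two_se(0.4, 50)
```

With fifty difference values the assumed standard error is about 0.35,
so a coefficient of 0.4 would not be judged real under this rule.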

A visual survey of the histograms of the difference
values at each NDM point revealed basically normal looking
distributions except at the three-sigma location, where the
distribution resembled a "J" shaped curve trailing off to
the left toward higher negative values. This indicated that
fewer observations occurred in this area than expected. This
result should be expected since restrictions based on wind
force and sea state at the higher-speed end of the data
probably eliminated many of these higher-current values. A
similar phenomenon was observed with the current-meter data.
It is interesting to note that, because of the restrictions
on the high-speed ends of the current values, both
current-meter and DOS records showed a similar exponential
distribution at the three sigma point, with nearly identical
values for the coefficient of skewness and coefficient of
kurtosis. Unless some means is derived to correct for the
lost data in the high-speed tail of the DOS distributions,
such as extending the upper limit of the 4.0 knot speed
class, the three sigma point, as with current meters,
appears to limit the useful range of DOS data.

A  K-S  test  was  conducted  at  each  NDM  to  test  the 
hypothesis  that  the  data  at  these  points  were  normal  about 
the  theoretical  mean  of  zero.   The  results  are  shown  in  the 
right  hand  column  of  Table  VII.   One  sees  that  there  is 
little  or  no  likelihood  that  the  data  could  be  normal  about 
zero.   This  corresponds  to  the  results  obtained  from  current- 
meter  data  as  shown  in  Table  III. 

3.   Significance  of  Deviations  of  the  Means

A  "Student"  t-test  was  made  to  test  the  hypothesis 
that  at  each  NDM,  the  deviation  of  the  data  mean  from  the 
theoretical  mean  of  zero  is  not  significant.   The  computations 
and  results  of  this  test  are  given  in  Table  VIII.   Only  two 
locations  passed  this  test  with  a  level  of  significance 
greater  than  0.05.   These  two  points,  -0.5  sigma  and  one 
sigma,  are  near  crossing  points  of  the  smooth  curve  in  Fig. 
14  and  therefore  concurrence  with  the  hypothesis  would  be 
high  at  those  points.   As  was  found  in  Fig.  6,  the  deviations 
causing  the  "S"  shape  curve  in  Fig.  14  are  most  likely  real. 
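
The statistic for this test can be sketched as below, assuming the usual
one-sample form in which the mean of the difference values at one NDM
is compared with zero. The function name and the difference values are
hypothetical and are not taken from Table VIII.

```python
import math

def t_statistic(differences):
    """One-sample Student t statistic for a hypothesized mean of zero."""
    n = len(differences)
    mean = sum(differences) / n
    # Unbiased sample variance (divisor n - 1).
    var = sum((d - mean) ** 2 for d in differences) / (n - 1)
    return mean / math.sqrt(var / n)

# Hypothetical difference values at a single NDM.
diffs = [0.012, -0.004, 0.020, 0.007, 0.015, -0.001, 0.010, 0.009]
t = t_statistic(diffs)
```

For fifty data sets (49 degrees of freedom) the two-sided critical
value at the 0.05 level is about 2.01, so a larger absolute value of
the statistic marks a deviation of the mean that is probably real.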

E.   PEARSON  DIAGRAM  USING  DOS  DATA 

A plot of the β₁, β₂ values for the fifty logarithmic DOS
data sets on a Pearson diagram is shown in Fig. 15. No
exceedingly large values of β₁ and β₂ were obtained from the
DOS distributions and no attempt was made to identify any
irregularities in the two distributions that had a β₂ value
greater than 4.5. It is readily apparent the DOS data is
closely  grouped  around  the  (0,3)  point  on  the  diagram  and 
compares  most  favorably  to  the  majority  of  the  current-meter 
distributions  in  Fig.  9.   The  Pearson  diagram  has  been  used 
as  an  indication  of  normality  and  as  a  tool  for  general  com- 
parison between  the  two  types  of  data  included  in  this  thesis. 
The  full  utilization  and  subsequent  implications  one  could 
employ  with  regard  to  current-speed  distributions  through 
use  of  the  Pearson  diagram  are  left  to  future  work  in  this 
area. 

F.   GENERAL  BAR  PLOT  COMPARISON  BETWEEN  CURRENT  METER  AND  DOS 
DATA 

Figure  16  is  a  composite  of  Fig.  6  and  Fig.  14  plotted 
together  for  comparison.   It  is  readily  apparent  that  the 
variability  in  difference  values  is  more  extreme  for  current- 
meter data than for DOS data. The most likely reasons for
this are that the DOS data are highly grouped, in general
contain  only  a  moderate  number  of  observations,  and  those 
observations  that  are  available  have  been  averaged  over  time 
and  space  due  to  the  nature  of  the  recording  technique.   It 
is  possible  that  the  DOS  data  are  a  better  measure  of  the 
statistics  of  current  measurements  made  over  very  long  per- 
iods than  are  the  current-meter  data,  having  gained  more  from 
their  randomized  distribution  over  years  of  time  than  they 
have  lost  from  their  various  known  distortions.   It  would 
seem, therefore, that the difference in variability has little
significance.


Another  apparent  discrepancy  in  Fig.  16  is  that  it  seems 
the  smooth  curve  through  the  means  of  the  current-meter  dif- 
ference values  is  offset  from  the  DOS  curve  by  about  one 
sigma  unit.   What  probably  has  happened  is  that  the  lower 
tail  of  the  DOS  data  has  been  compressed  towards  the  upper 
tail  by  about  one  sigma  unit.   This  also  accounts  for  the 
fact  that  no  DOS  data  sets  exhibited  low-speed  values  which 
extended  out  to  three  standard  deviations  from  the  mean.   The 
reason  is  that  the  grouping  into  class  intervals  centers  the 
speed  of  the  lowest  class  interval  higher  than  the  several 
low-speed  class  intervals  in  the  current-meter  distribution. 
These  low  speeds  become  relatively  large  deviants  from  the 
mean  after  the  logarithmic  transformation.   It  was  necessary 
to  make  an  ad  hoc  readjustment  of  the  DOS  data  in  the  low-
speed  end  while  at  the  same  time  setting  a  rigid  boundary  on 
the  high-speed  end.   No  such  adjustments  were  necessary  for 
the  current-meter  data. 

G.   OTHER  POSSIBLE  VARIATIONS  WITHIN  DOS  DATA 

Some  of  the  variability  noted  in  the  current-speed  data 
used  in  this  report  could  have  come  from  differences  in  area 
influences  on  the  data  and  differences  in  seasonal  influences 
on  the  data.   Because  the  DOS  data  was  available  in  a  large 
quantity  covering  both  area  and  time,  an  effort  was  made  to 
check  for  possible  indications  of  these  two  types  of  vari- 
ability using  the  DOS  data  only.   The  procedure  was  to  select 
the  data  and  plot  it  in  the  bar  format  similar  to  Fig.  6. 
The  number  of  distributions  that  could  be  used  from  the 
fifty  DOS  data  sets  previously  studied  ranged  between  eight
and  ten  for  each  case  cited  below.   Because  the  data  base 
was  small,  the  plots  generated  were  used  to  provide  possible 
indications  of  differences  without  proceeding  further  with 
significance  tests  or  in-depth  reasoning  for  their  existence. 

1.  Area  Influence

Figure  17  is  a  plot  using  data  from  two  separate 
areas,  MS  149-3  and  MS  114-1,  to  check  for  possible  area  in- 
fluence on  current  speeds.   Individual  months  in  each  area 
were  taken  as  separate  data  sets.   The  plot  indicates  the 
surface  current  speeds  in  MS  149-3  are  in  general  more  vari- 
able over  a  year's  period  than  in  MS  114-1.   This  is  not 
too  surprising  since  MS  149-3  is  east  of  Newfoundland  and 
probably  more  susceptible  to  storms  and  current  variations; 
the  area  is  located  in  the  vicinity  where  the  Labrador  and 
Gulf  Stream  current  regimes  generally  mix.   MS  114-1  is  in 
the  mid-Atlantic  east  of  Charleston,  South  Carolina,  where 
active  and  variable  current-speed  conditions  are  not  known 
to  exist.   However,  the  general  shape  of  the  two  curves  is 
about  the  same.   Therefore,  the  indication  is  that  possibly 
an  area  influence  on  surface  current  speeds  exists  affecting 
the  variability  of  the  speeds  but  not  the  general  shape  of 
the  distribution  of  the  logarithmic-speed  curve. 

2.  Seasonal  Influence

Figure  18  is  a  plot  using  data  from  two  separate 
seasons  of  the  year,  winter  and  summer.   For  winter,  the  month
of  January  was  selected  and  the  data  randomly  covered  all  MS 
areas  except  MS  148.   For  summer,  the  month  of  July  was  selected 
and the data covered the same MS areas and quadrants from
which the January data was taken. The plot indicates that
in the summer the currents are in general more variable in
magnitude than the winter currents; however, the general
shape of the curves is somewhat similar. The implications
of  the  variability  noted  cannot  be  readily  related  to  any 
current  regimes.   Since  the  factors  which  create  and  maintain 
ocean  currents  are  numerous  and  sometimes  unpredictable,  an 
in-depth  study  would  be  needed  to  confirm  these  variability 
results  and  then  to  establish  reasons  for  their  existence. 
However,  there  are  indications  that  seasons  do  influence  the 
variability  but  not  the  distribution  of  DOS  current-speed 
data. 
VI.   CONCLUSIONS

The  logarithmic-speed  transformation  of  both  current- 
meter  and  altered  DOS  speed  records  produces  distributions 
that  as  a  group  can  be  considered  symmetrically  distributed 
about their means. The distribution of the mean values
appears log-normal. However, the mean of both types of data
exhibits a slight "S" shape systematic deviation which is
probably  real.   This  systematic  deviation  appears  likely  to 
be  the  result  of  external  influences  on  the  data  which  are 
both  natural  and  man-made.   Indications  are  that  elimination 
of  these  influences  would  allow  the  mean  of  the  logarithmic- 
speed  distributions  of  current  data  to  be  log-normal  with 
a  high  level  of  confidence. 

DOS  data  compares  quite  favorably  to  current-meter  data 
and  could  be  used  to  derive  probability  estimates  for  sur- 
face current  speeds  in  areas  where  no  other  data  is  available. 

The  limits  of  usefulness  of  current-meter  data  appear
to  extend  from  NDM  values  of  at  least  -3  sigma  to  somewhere 
between  two  and  three  sigma.   For  DOS  data  these  limits  are 
from  at  least  -2  sigma  to  somewhere  between  two  and  three 
sigma.   Extrapolation  beyond  these  limits  is  extremely  un- 
certain and,  with  present  knowledge,  should  only  be  con- 
sidered if  the  consequences  of  a  many-fold  error  in  probability 
are  freely  accepted. 

Indications  are  that  seasonal  and  area  influences  on  the 
current  speeds  exist  but  that  these  influences  are  limited 

Figure 1.   Location of Oregon State University Coastal
Upwelling Experiment Current Meters

Figure 2.   Marsden Square Grid Off the East Coast

Figure 3.   Marsden Square Grid System.

Figure 6.   Deviation of Logarithmic-Speed Distribution from
the Log-Normal Distribution. Mean and Range of
23 to 29 Moored Current-Meter Time-Series Data
Sets.

Figure 12.   Empirical Cumulative Distribution Function (e.c.d.f.)
for MS 115-1-10 After Nine-Tenths Alteration.

Figure 13.   Probability Density Plots of the Logarithmic-Speed
Distribution for Drift-of-Ship Data (a) MS 116-4-6
and (b) MS 116-3-9.

Figure 14.   Deviation of Logarithmic-Speed Distribution from
the Log-Normal Distribution. Mean and Range of
50 Altered Drift-of-Ship Current-Data Records.

Figure 16.   Comparison of Figure 6 (Solid Line, Lower Bars) and
Figure 14 (Dashed Line, Upper Bars).

Figure 17.   Deviation of Logarithmic-Speed Distribution from
the Log-Normal Distribution. Comparison of the
Mean and Range for Two Separate Areas; MS 114-1
(Dashed Line, Upper Bars) and MS 149-3 (Solid
Line, Lower Bars).

Figure 18.   Deviation of Logarithmic-Speed Distribution from
the Log-Normal Distribution. Comparison of the
Mean and Range for Two Separate Seasons; January
(Dashed Line, Upper Bars) and July (Solid Line,
Lower Bars).